AITopics | model quality

Collaborating Authors

model quality

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

Neural Information Processing SystemsFeb-18-2026, 04:42:42 GMT

We leverage this unique capability to propose NoMAD-Attention, an efficient attention algorithm that replaces MAD operations with in-register lookups.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Harris County > Houston (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

IWBVT: Instance Weighting-based Bias-Variance Trade-off for Crowdsourcing

Neural Information Processing SystemsFeb-16-2026, 22:49:07 GMT

In recent years, a large number of algorithms for label integration and noise correction have been proposed to infer the unknown true labels of instances in crowdsourcing.

artificial intelligence, machine learning, social media, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
North America > Canada (0.04)
(8 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Communications > Social Media > Crowdsourcing (0.87)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

KVCacheis1BitPerChannel: EfficientLarge LanguageModelInferencewithCoupledQuantization

Neural Information Processing SystemsFeb-7-2026, 10:04:28 GMT

Furthermore, we demonstrate that CQ can preservemodel quality reasonably with KV cache quantizeddownto1bit.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)

Add feedback

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees

Neural Information Processing SystemsDec-24-2025, 13:06:01 GMT

Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks. Despite recent intensive studies of gradient compression for data parallel-style training, compressing the activations for models trained with pipeline parallelism is still an open problem. In this paper, we propose AQ-SGD, a novel activation compression algorithm for communication-efficient pipeline parallelism training over slow networks.

activation quantization, compression, fine-tuning language model, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Add feedback

First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training

Kim, Gyudong, Na, Hyukju, Kim, Jin Hyeon, Jang, Hyunsung, Park, Jaemin, Hwang, Jaegi, Ha, Namkoo, Kim, Seungryong, Kim, Young Geun

arXiv.org Artificial IntelligenceDec-9-2025

As training billion-scale transformers becomes increasingly common, employing multiple distributed GPUs along with parallel training methods has become a standard practice. However, existing transformer designs suffer from significant communication overhead, especially in Tensor Parallelism (TP), where each block's MHA-MLP connection requires an all-reduce communication. Through our investigation, we show that the MHA-MLP connections can be bypassed for efficiency, while the attention output of the first layer can serve as an alternative signal for the bypassed connection. Motivated by the observations, we propose FAL (First Attentions Last), an efficient transformer architecture that redirects the first MHA output to the MLP inputs of the following layers, eliminating the per-block MHA-MLP connections. This removes the all-reduce communication and enables parallel execution of MHA and MLP on a single GPU. We also introduce FAL+, which adds the normalized first attention output to the MHA outputs of the following layers to augment the MLP input for the model quality. Our evaluation shows that FAL reduces multi-GPU training time by up to 44%, improves single-GPU throughput by up to 1.18x, and achieves better perplexity compared to the baseline GPT. FAL+ achieves even lower perplexity without increasing the training time than the baseline. Codes are available at: https://github.com/CASL-KU/FAL

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.14614

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Matryoshka Model Learning for Improved Elastic Student Models

Verma, Chetan, Timmaraju, Aditya Srinivas, Hsieh, Cho-Jui, Damle, Suyash, Bui, Ngot, Zhang, Yang, Chen, Wen, Liu, Xin, Jain, Prateek, Dhillon, Inderjit S

arXiv.org Artificial IntelligenceDec-3-2025

Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development. In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. TA models are larger versions of the Student models with higher capacity, and thus allow Student models to better relate to the Teacher model and also bring in more domain-specific expertise. Furthermore, multiple accurate Student models can be extracted from the TA model. Therefore, despite only one training run, our methodology provides multiple servable options to trade off accuracy for lower serving cost. We demonstrate the proposed method, MatTA, on proprietary datasets and models. Its practical efficacy is underscored by live A/B tests within a production ML system, demonstrating 20% improvement on a key metric. We also demonstrate our method on GPT-2 Medium, a public model, and achieve relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.

large language model, machine learning, student model, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3711896.3737245

2505.23337

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry:

Education > Educational Technology > Educational Software (1.00)
Education > Educational Setting (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

D-com: Accelerating Iterative Processing to Enable Low-rank Decomposition of Activations

Tahmasebi, Faraz, Pelluer, Michael, Kwon, Hyoukjun

arXiv.org Artificial IntelligenceOct-16-2025

The computation and memory costs of large language models kept increasing over last decade, which reached over the scale of 1T parameters. To address the challenges from the large scale models, model compression techniques such as low-rank decomposition have been explored. Previous model decomposition works have focused on weight decomposition to avoid costly runtime decomposition, whose latency often significantly exceeds the benefits from decomposition (e.g., 38% more end-to-end latency when running Llama2-7b on A100 with 4K sequence length with activation decomposition compared to no decomposition). In this work, we debunk such observations and report that the input decomposition can be significantly beneficial with a proper choice of decomposition algorithm and hardware support. We adopt progressive decomposition algorithm, Lanczos algorithm, and design a co-accelerator architecture for the decomposition algorithm. To address the memory- boundness of the decomposition operation, we introduce a novel compute replication methodology that moves the op- eration toward compute-bound region, which enables 6.2x speedup in our evaluation. We also develop an output shape- preserving computation scheme that eliminates decomposi- tion costs in consecutive layers. To compensate model quality loss from compression, we introduce a multi-track decom- position approach that separately handles outlier channels for high accuracy and low perplexity with minimal compu- tational costs. Combined together, our accelerator, D-com, provides 22% end-to-end latency improvements compared to A100 GPU at the cost of small model quality degradation (e.g., 3% on AI2 Reasoning Challenge task).

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.13147

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

Neural Information Processing SystemsOct-10-2025, 16:49:52 GMT

We leverage this unique capability to propose NoMAD-Attention, an efficient attention algorithm that replaces MAD operations with in-register lookups.

arxiv preprint arxiv, nomad-attention, sub, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Harris County > Houston (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

IWBVT: Instance Weighting-based Bias-Variance Trade-off for Crowdsourcing

Neural Information Processing SystemsOct-10-2025, 11:08:34 GMT

In recent years, a large number of algorithms for label integration and noise correction have been proposed to infer the unknown true labels of instances in crowdsourcing.

algorithm, dataset, model quality, (14 more...)

Neural Information Processing Systems

Country: